o 

Oh. 
< 



X 



A proximal approach to the inversion of ih-conditioned 



'd\'- matrices 
o 



Pierre Marechal* Aude Rondepierre^ 



Abstract 

' We propose a general proximal algorithm for the inversion of ill-conditioned ma- 

' trices. This algorithm is based on a variational characterization of pseudo-inverses. 

r~| . We show that a particular instance of it (with constant regularization parameter) 

I belongs to the class of fixed point methods. Convergence of the algorithm is also 

discussed. 



> ■ 1 Introduction 
cn 

Inverting ill-conditioned large matrices is a challenging problem involved in a wide range of 
O ' applications, including inverse problems (image reconstruction, signal analysis, etc.) and 
Tj- ■ partial differential equations (computational fluid dynamics, mechanics, etc.). There are 
^ . two classes of methods: the first one involves factorization of the matrix (SVD, QR, LU, 
! LQUP); the second one involves iterative schemes (fixed point methods, projection onto 
increasing sequences of subspaces). 

The main purpose of this note is to show that a particular instance of the Proximal 
Point Algorithm provides a fixed point method for the problem of matrix inversion. This 
, fact is based on the observation that the pseudo-inverse Mt of a matrix M e R™''" satisfies 
the Gxed point equation: 

$ = ^($) := B$ + with B := {I + fiM^ M)-^ and C := {M^ M + fi'^ I)-^ 

where /i > 0. The corresponding Gxed point iteration ^k+i = + C is nothing but a 
proximal iteration. We see that ip is a contraction and that, if M is positive definite, 
then (y9 is a strict contraction. It is worth noticing that, in the proximal algorithm, /i may 
depend on k, allowing for large (but inaccurate) steps for early iteration and small (but 
accurate) steps when approaching the solution. 
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The Proximal Point Algorithm (PPA) was introduced in 1970 by Martinet [5], in the 
context of the regularization of variational inequalities. A few years later, Rockafellar [6] 
generalized the PPA to the computation of zeros of a maximal monotone operator. Under 
suitable assumptions, it can be used to efficiently minimize a given function, by finding 
iteratively a zero in its Clarke subdifferential. 

Throughout, we denote by ||M||^ the Frobenius norm of a matrix M G R™^" and by 
{M,N)p the Frobenius inner product of M,N E j^mxn (^^]^[q[^ jg given by {M,N)f = 
tT{MN^) = tT{N^M)). In R™^", we denote by distF{M,S) the distance between a 
matrix M and a set iS: 



The identity matrix will be denoted by /, its dimension being always clear from the context. 

The next theorem, whose proof may be found e.g. in [1], provides a variational char- 
acterization of 

Theorem 1.1 The pseudo-inverse of a matrix M G R™-^" is the solution of minimum 
Frobenius norm of the optimization problem 



2 The proximal point algorithm 

The proximal point algorithm is a general algorithm for computing zeros of maximal mono- 
tone operators. A well-known application is the minimization of a convex function / by 
finding a zero in its subdifferential. In our setting, it consists in the following steps: 

1. Choose an initial matrix $o ^ R™^"; 

2. Generate a sequence ($fc)fc>o according to the formula 



in which {fik)k>o is a sequence of positive numbers, until some stopping criterion is 
satisfied. 

Equation ([1]) will be subsequently referred to as the proximal iteration of Problem (V) . 
The stopping criterion may combine, as usual, conditions such as 



distF(M, S) 



inf{||M-M'||F|M' G S}. 




over R* 




(1) 



||V/(<I>fe)||F < £i and - $fe_i||F < £2, 



where the parameters ei and 62 control the precision of the algorithm. 
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Clearly, the function / : $ i— ||A'f$ — /|||,/2 is convex and indefinitely differentiable. 
Therefore, solutions of the proximal iteration ([1]) are characterized by the relationship 

{I + fikM^M)<!?k+i = ^k + f^kM^. (2) 

Since M is positive semi-definite and /ifc is chosen to be positive for all k, the matrix 
(/ + fikM~^ M) is nonsingular and the proximal iteration also reads: 

= (/ + fikM^M) {<l>k + /ifcM"^) . (3) 

The iterates $fc could be computed either exactly (in the ideal case), or approximately, 
using e.g. any efficient minimization algorithm to solve the proximal iteration ([1]). In that 
case, we need another stopping criterion and we here choose the following one suggested 
in [1]: 

||$fc+i-A(<l>fc + /XfcM^)||ir<efcmin{l,||<l>fc+i-$fc||^}, r>l (4) 

where > and the series is convergent. Notice that the larger r, the more accurate 
the computation of ^k+i- Notice also that, in the case where /x^ = for all fc, each 
proximal iteration involves the multiplication by the same matrix A := {I + fiM~^ M)~^ , 
and that the latter inverse may be easy to compute numerically, if the matrix / + /xM^M 
is well-conditioned. 

We now turn to convergence issues. Recall that our objective function / is a quadratic 
function whose Hessian M is positive semi-definite. Nevertheless, unless M^M is 
positive definite, the matrix / — (/ + ^M^ M)~^ is in general singular and the classical 
convergence theorem for iterative methods (see e.g. |2j) is not helpful here to prove the 
convergence of our proximal scheme. The following proposition is a consequence of Theo- 
rem 2.1 in [1]. For clarity, we shall denote by M. the linear mapping $ ^ M$, by C the 
linear mapping $ v-^ M^M$ and by A the linear mapping $ i-^ A$ = (/ + /iM^M)^^$. 

Proposition 2.1 Let ai he the smallest nonzero eigenvalue of C and let Ei be the corre- 
sponding eigenspace. Assume that fik = fJ' for all k and that $o is not (-, ■) p- orthogonal to 
the eigenspace Ei . Then, 

— TT^ ^-n ^ and -— — i > Wi as k ^ oo, 

\\^k+l-^k\\F l + ttl/i 

in which is a unit eigenvector in Ei. Moreover the sequence {^k) generated by the 
proximal algorithm, either with infinite precision or using the stopping criterion (|4]) for 
the inner loop, converges linearly to the orthogonal projection of $o onto the solution set 
S : = argmin / = ikfl" + ker Ai . 

Proof. Step 1. The error A^+i := ^k+i - (at iterate k + 1) satisfies: A^+i = (/ + 
/iM^M)~^Afc. The latter iteration is that of the power method for the linear mapping A. 
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Clearly, A is symmetric and positive definite. Consequently, its eigenvalues are strictly 
positive and there exists an unique eigenvalue of largest modulus (not necessarily simple). 

Notice now that the eigenspace Ei associated to the eigenvalue ai is nothing but the 
eigenspace of A for its largest eigenvalue strictly smaller than 1, namely, 1/(1 + cti/x). We 
proceed as in [71 Theorem 1] to obtain the desired convergence rate via that of the iterated 
power method. 

Step 2. We now establish the linear convergence of the sequence ($fc). First, the 
solution set S is clearly nonempty since it contains Moreover, let ($fc) be a sequence 
generated by the PPA algorithm using the stopping criterion (jl]). Let us prove that 



3a > 0, 35 > 0, V$ G R" 



||V/($)||^ < S dist^(<l>,5) < a||V/($)||i. , (5) 



which is nothing but Condition (2.1) in [H Theorem 2.1], in our context. Let $ G R™'^" 
and let $ be the orthogonal projection of over (ker A^)-*-. It results from the classical 
theory of linear least squares that distil ($,5) = ||<l-M"l"||ir. Since M'^MI>-M'^ = V/($) 
and M^MMt - = 0, we also have: V/($) = M^M(<I - M^). Moreover, $ - g 
(keiM)^ = (ker/:)^, so that 

||V/($)||f = ||M^M(<i - M^)\\f > ai||<i - M^\f. 

It follows that ([5]) is satisfied with a = \ja\. The conclusion then follows from |3l The- 
orem 2.1]: the sequence ($fc) converges linearly with a rate bounded by a/ \Ja? + ij? = 



1/^/1 +fi^al < 1. 

Step 3. By rewriting the proximal iteration in an orthonormal basis of eigenvectors 
of C, we finally prove that the limit of the sequence ($fc) is the orthogonal projection of 
$0 onto argmin/. ■ 

A complete numerical study, which goes beyond the scope of this paper, is currently in 
progress and will be presented in a forthcoming publication. Let us merely mention that 
our proximal approach makes it possible to combine features of factorization methods (in 
the proximal iteration) with features of iterative schemes. In particular, if M if invertible, 
it shares with iterative methods the absence of error propagation and amplification, since 
each iterate can be regarded as a new initial point of a sequence which converges to the 
desired solution. 



3 Comments 

Tikhonov approximation. A standard approximation of the pseudo-inverse of an ill- 
conditioned matrix M is {M~^ M + eI)~^M^ , where e is a small positive number. This 
approximation is nothing but the Tikhonov regularization of M\ with regularization pa- 
rameter e. It is worth noticing that the choice $o = in the proximal algorithm yields the 
latter approximation for e = 1/fi after one proximal iteration. 

Trade-offs. At the A;-th proximal iteration, the perturbation of the objective function / 
is, roughly speaking, proportional to the square of the distance between the current iterate 
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and the solution set of (V), and inversely proportional to /i^. In order to speed up the 
algorithm, it seems reasonable to choose large /i^ for early iterations, yielding large but 
inaccurate steps, and then smaller fik for late iterations, where proximity with the solution 
set makes it suitable to perform small and accurate steps. This is especially true in the 
case where M is invertible, since the solution set then reduces to {M~^}. Moreover, 
numerical accuracy in early proximal iteration may be irrelevant, since the limit of the 
proximal sequence is what really matters. A trade-off between a rough approximation 
of the searched proximal point and an accurate and costly solution must be found. As 
suggested in |3], one may use the following stopping criterion for the proximal iteration: 

- < 5(V/(<l>fc+i), $^+1 - <i>k)F. 

This criterion is an Armijo-like rule: the algorithm stops when the improvement of the 
objective function / is at least a given fraction 6 G (0, 1) of its ideal improvement. 

Inversion versus linear systems. It is often unnecessary to compute the inverse of 
a matrix M, in particular when the linear system Mx = d must be solved for a few 
data vectors d only. In such cases, of course, the usual proximal strategy may be used to 
compute least squares solutions. It is important to realize that, although the regularization 
properties of the proximal algorithm are effective at every proximal iteration, perturbations 
of d may still have dramatic effects on the algorithm if M is ill-conditioned. In applications 
for which no perturbation of the data must be considered, accurate solutions may be 
reached by a proximal strategy. We emphasize that, in the minimization of $ i-^ ||M$ — 
I\\f, the data I undergoes no perturbation whatsoever. 
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